
Bioinformatics pipeline summary


An overview of the processes that make up the pipeline

Author: Adrien Taudière

Date: October 29, 2024

Summary of the bioinformatic pipeline

```r
library(knitr)
library(targets)
library(MiscMetabar)
# Declare where this file sits relative to the project root, for {here}
here::i_am("analysis/01_bioinformatics.qmd")
```
```r
d_pq <- clean_pq(tar_read("d_vs", store = here::here("_targets/")))
```
Cleaning suppress 2 taxa and 1 samples.
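The message above reports how many taxa and samples were dropped. As a rough illustration, here is a minimal base-R sketch of that kind of cleaning, assuming (as `clean_pq()`'s default arguments suggest) that taxa and samples with zero total reads are removed. The helper `drop_empty()`, the toy counts and the samples-in-rows layout are hypothetical; the real function works on a phyloseq object and has further options.

```r
# Hypothetical helper, NOT part of MiscMetabar: drops empty taxa and empty
# samples from a count table. Assumption: samples are rows, taxa are columns.
drop_empty <- function(otu) {
  empty_taxa <- colSums(otu) == 0      # taxa never observed in any sample
  empty_samples <- rowSums(otu) == 0   # samples without any read
  cleaned <- otu[!empty_samples, !empty_taxa, drop = FALSE]
  message("Removed ", sum(empty_taxa), " taxa and ",
          sum(empty_samples), " samples.")
  cleaned
}

# Toy counts: sample S3 has no reads, taxa T2 and T4 are never observed
otu <- matrix(c(5, 0, 3, 0,
                2, 0, 0, 0,
                0, 0, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("S1", "S2", "S3"), c("T1", "T2", "T3", "T4")))
cleaned <- drop_empty(otu)
dim(cleaned)
# [1] 2 2
```

Both masks are computed on the original table; this is safe because removing an all-zero column does not change any row sum, and vice versa.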
```r
summary_plot_pq(d_pq)
```
Cleaning suppress 0 taxa and 0 samples.

```r
tar_glimpse(script = here::here("_targets.R"), targets_only = TRUE,
            callr_arguments = list(show = FALSE))
```
```r
tar_meta(store = here::here("_targets/"), targets_only = TRUE) |>
  # Run time as h:m:s; minutes are taken modulo the hour
  dplyr::mutate(time = paste0(seconds %/% 3600, ":",
                              (seconds %% 3600) %/% 60, ":",
                              floor(seconds %% 60))) |>
  dplyr::select(name, seconds, bytes, format, time) |>
  # Stored size in gigabytes (10^9 bytes)
  dplyr::mutate(Gb = round(bytes / 10^9, 2)) |>
  dplyr::arrange(dplyr::desc(seconds), dplyr::desc(bytes)) |>
  kable()
```
| name | seconds | bytes | format | time | Gb |
|:---|---:|---:|:---|:---|---:|
| tax_tab | 2233.722 | 130811 | rds | 0:37:13 | 0.00 |
| ddR | 1506.865 | 170093780 | qs | 0:25:6 | 0.17 |
| ddF | 1191.481 | 159773594 | qs | 0:19:51 | 0.16 |
| quality_raw_seq | 844.965 | 21139 | rds | 0:14:4 | 0.00 |
| cutadapt | 752.728 | 69632 | file | 0:12:32 | 0.00 |
| quality_seq_wo_primers | 557.035 | 20867 | rds | 0:9:17 | 0.00 |
| filtered | 550.641 | 164 | rds | 0:9:10 | 0.00 |
| quality_seq_filtered_trimmed_FW | 307.363 | 13258 | rds | 0:5:7 | 0.00 |
| quality_seq_filtered_trimmed_REV | 307.321 | 13258 | rds | 0:5:7 | 0.00 |
| derep_rs | 201.935 | 1713022114 | qs | 0:3:21 | 1.71 |
| derep_fs | 180.589 | 1207693426 | qs | 0:3:0 | 1.21 |
| err_rs | 134.503 | 17212 | qs | 0:2:14 | 0.00 |
| merged_seq | 126.202 | 1376856 | qs | 0:2:6 | 0.00 |
| err_fs | 112.515 | 16268 | qs | 0:1:52 | 0.00 |
| track_by_samples | 97.301 | 10170 | rds | 0:1:37 | 0.00 |
| quality_seq_filtered_trimmed_REV_plot | 70.716 | 11369583 | rds | 0:1:10 | 0.01 |
| build_website | 70.214 | 44 | rds | 0:1:10 | 0.00 |
| quality_seq_filtered_trimmed_FW_plot | 70.160 | 11369583 | rds | 0:1:10 | 0.01 |
| bioinfo_report | 57.386 | 44 | rds | 0:0:57 | 0.00 |
| quality_seq_wo_primers_plot | 3.805 | 765212 | rds | 0:0:3 | 0.00 |
| seqtab_wo_chimera | 3.447 | 298108 | rds | 0:0:3 | 0.00 |
| quality_raw_seq_plot | 1.479 | 424231 | rds | 0:0:1 | 0.00 |
| d_vs | 0.898 | 409383 | rds | 0:0:0 | 0.00 |
| d_vs_mumu | 0.758 | 395883 | rds | 0:0:0 | 0.00 |
| seq_tab_Pairs | 0.352 | 362296 | rds | 0:0:0 | 0.00 |
| d_vs_mumu_rarefy | 0.235 | 362059 | rds | 0:0:0 | 0.00 |
| track_sequences_samples_clusters | 0.097 | 376 | rds | 0:0:0 | 0.00 |
| s_d | 0.075 | 131725 | rds | 0:0:0 | 0.00 |
| data_phyloseq | 0.073 | 496389 | rds | 0:0:0 | 0.00 |
| data_raw | 0.022 | 5255 | rds | 0:0:0 | 0.00 |
| asv_tab | 0.012 | 293416 | rds | 0:0:0 | 0.00 |
| seqtab | 0.005 | 295766 | rds | 0:0:0 | 0.00 |
| sam_tab | 0.004 | 121606 | rds | 0:0:0 | 0.00 |
| file_refseq_taxo | 0.000 | 1022856 | file | 0:0:0 | 0.00 |
| file_sam_data_csv | 0.000 | 470237 | file | 0:0:0 | 0.00 |
| fastq_files_folder | 0.000 | 65536 | file | 0:0:0 | 0.00 |
| data_fnfs | 0.000 | 2733 | rds | 0:0:0 | 0.00 |
| data_fnrs | 0.000 | 2731 | rds | 0:0:0 | 0.00 |
| samp_n_otu_table | 0.000 | 1129 | rds | 0:0:0 | 0.00 |
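The `time` column relies on integer division (`%/%`) and modulo (`%%`). Minutes have to be taken modulo the hour: otherwise a target running longer than one hour would show its total minutes, e.g. 3725 s would read `1:62:5` instead of `1:2:5`. The helper `format_hms()` below is a hypothetical base-R sketch of that arithmetic, with zero-padding added for readability; it is not a {targets} function.

```r
# Hypothetical helper: convert durations in seconds to "h:mm:ss" strings.
format_hms <- function(seconds) {
  h <- seconds %/% 3600              # whole hours
  m <- (seconds %% 3600) %/% 60      # whole minutes left within the hour
  s <- floor(seconds %% 60)          # whole seconds left within the minute
  sprintf("%d:%02d:%02d", as.integer(h), as.integer(m), as.integer(s))
}

format_hms(c(2233.722, 1506.865, 3725))
# [1] "0:37:13" "0:25:06" "1:02:05"
```

Zero-padding keeps the strings aligned, but note that they still sort alphabetically rather than by duration; sorting on the numeric `seconds` column, as the table does, avoids that pitfall.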

Load phyloseq object from targets store

```r
d_pq <- tar_read("d_vs", store = here::here("_targets/"))
```

The {targets} package is at the core of this project. If you are not familiar with {targets}, please read the introduction of its user manual.

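The {targets} package stores … targets in a folder (for this project, `_targets/`) and can load (`tar_load()`) or read (`tar_read()`) objects from this folder: `tar_read()` returns the stored object so it can be assigned to any name, while `tar_load()` creates a variable named after the target in the current environment.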
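The difference between the two readers can be seen on a throwaway pipeline. This sketch uses the real {targets} helpers `tar_dir()` (run code from a temporary folder, so this project's `_targets/` store is untouched), `tar_script()` and `tar_make()`; the target `n_reads` and its value are made up for illustration.

```r
library(targets)

tar_dir({  # everything below runs from a temporary directory
  # Write a one-target _targets.R script (hypothetical target n_reads)
  tar_script(list(tar_target(n_reads, 1000L)), ask = FALSE)
  # Build it in the current R session, without progress messages
  tar_make(callr_function = NULL, reporter = "silent")

  # tar_read() returns the stored value, bound here to a name of our choice
  reads_value <- tar_read(n_reads)

  # tar_load() creates a variable named after the target (n_reads)
  tar_load(n_reads)
})

identical(reads_value, n_reads)
# [1] TRUE
```

`tar_read()` is usually preferable in reports, as in the chunk above that assigns `d_vs` to `d_pq`, because the variable name stays explicit.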

Sample data

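```r
# Interactive table of the sample metadata stored in the phyloseq object
DT::datatable(as(phyloseq::sample_data(d_pq), "data.frame"))
```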

Sequences, samples and clusters across the pipeline

Session Information

Session information is detailed below. More information about the machine, the system, and the Python and R packages is available in the file data_final/information_run.txt.

```r
sessionInfo()
```
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] MiscMetabar_0.10.1 purrr_1.0.2        dplyr_1.1.4        dada2_1.32.0      
[5] Rcpp_1.0.13        ggplot2_3.5.1      phyloseq_1.48.0    targets_1.8.0     
[9] knitr_1.48        

loaded via a namespace (and not attached):
  [1] bitops_1.0-9                deldir_2.0-4               
  [3] permute_0.9-7               rlang_1.1.4                
  [5] magrittr_2.0.3              ade4_1.7-22                
  [7] matrixStats_1.4.1           compiler_4.4.1             
  [9] mgcv_1.9-1                  png_0.1-8                  
 [11] callr_3.7.6                 vctrs_0.6.5                
 [13] reshape2_1.4.4              stringr_1.5.1              
 [15] pwalign_1.0.0               pkgconfig_2.0.3            
 [17] crayon_1.5.3                fastmap_1.2.0              
 [19] backports_1.5.0             XVector_0.44.0             
 [21] labeling_0.4.3              utf8_1.2.4                 
 [23] Rsamtools_2.20.0            rmarkdown_2.28             
 [25] UCSC.utils_1.0.0            ps_1.8.0                   
 [27] xfun_0.48                   cachem_1.1.0               
 [29] zlibbioc_1.50.0             GenomeInfoDb_1.40.1        
 [31] jsonlite_1.8.9              biomformat_1.32.0          
 [33] highr_0.11                  rhdf5filters_1.16.0        
 [35] DelayedArray_0.30.1         Rhdf5lib_1.26.0            
 [37] BiocParallel_1.38.0         jpeg_0.1-10                
 [39] parallel_4.4.1              cluster_2.1.6              
 [41] R6_2.5.1                    bslib_0.8.0                
 [43] RColorBrewer_1.1-3          stringi_1.8.4              
 [45] jquerylib_0.1.4             GenomicRanges_1.56.2       
 [47] SummarizedExperiment_1.34.0 iterators_1.0.14           
 [49] IRanges_2.38.1              Matrix_1.7-0               
 [51] splines_4.4.1               igraph_2.1.1               
 [53] tidyselect_1.2.1            abind_1.4-8                
 [55] yaml_2.3.10                 vegan_2.6-8                
 [57] codetools_0.2-20            hwriter_1.3.2.1            
 [59] processx_3.8.4              lattice_0.22-6             
 [61] tibble_3.2.1                plyr_1.8.9                 
 [63] Biobase_2.64.0              withr_3.0.1                
 [65] ShortRead_1.62.0            evaluate_1.0.1             
 [67] survival_3.7-0              RcppParallel_5.1.9         
 [69] Biostrings_2.72.1           pillar_1.9.0               
 [71] BiocManager_1.30.25         MatrixGenerics_1.16.0      
 [73] DT_0.33                     renv_1.0.11                
 [75] foreach_1.5.2               stats4_4.4.1               
 [77] generics_0.1.3              rprojroot_2.0.4            
 [79] S4Vectors_0.42.1            munsell_0.5.1              
 [81] scales_1.3.0                base64url_1.4              
 [83] glue_1.8.0                  tools_4.4.1                
 [85] interp_1.1-6                data.table_1.16.2          
 [87] GenomicAlignments_1.40.0    visNetwork_2.1.2           
 [89] rhdf5_2.48.0                grid_4.4.1                 
 [91] ape_5.8                     crosstalk_1.2.1            
 [93] latticeExtra_0.6-30         colorspace_2.1-1           
 [95] nlme_3.1-165                GenomeInfoDbData_1.2.12    
 [97] cli_3.6.3                   fansi_1.0.6                
 [99] S4Arrays_1.4.1              gtable_0.3.5               
[101] sass_0.4.9                  digest_0.6.37              
[103] BiocGenerics_0.50.0         SparseArray_1.4.8          
[105] farver_2.1.2                htmlwidgets_1.6.4          
[107] htmltools_0.5.8.1           multtest_2.60.0            
[109] lifecycle_1.0.4             here_1.0.1                 
[111] httr_1.4.7                  secretbase_1.0.3           
[113] MASS_7.3-61                

Citation

BibTeX citation:
@online{taudière2024,
  author = {Taudière, Adrien},
  title = {Bioinformatics Pipeline Summary},
  date = {2024-10-29},
  langid = {en}
}
For attribution, please cite this work as:
Taudière, Adrien. 2024. “Bioinformatics Pipeline Summary.” October 29, 2024.